Add --sam-omit-prim-seq #458

Closed
sfiligoi wants to merge 2 commits


sfiligoi
Contributor

Add --sam-omit-prim-seq, with the same semantics as --sam-omit-sec-seq but operating on primary alignments.

Addresses #457
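
For readers unfamiliar with the intended effect, here is a minimal Python sketch of what such an option would do to a single SAM line, assuming it mirrors the existing secondary-alignment option by writing `*` into the SEQ and QUAL columns. The function name `omit_primary_seq` is hypothetical, and treating unmapped records as "not a primary alignment" is an assumption of this sketch, not something the PR description states.

```python
SECONDARY = 0x100      # SAM FLAG bit: secondary alignment
SUPPLEMENTARY = 0x800  # SAM FLAG bit: supplementary alignment
UNMAPPED = 0x4         # SAM FLAG bit: read is unmapped


def omit_primary_seq(line: str) -> str:
    """Return the SAM line with SEQ and QUAL replaced by '*' for primary alignments.

    Hypothetical helper; header lines, secondary/supplementary records and
    (by assumption) unmapped records are returned unchanged.
    """
    if line.startswith("@"):
        return line  # header line
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 11:
        return line  # not a complete SAM record; leave it alone
    flag = int(fields[1])
    if flag & (SECONDARY | SUPPLEMENTARY | UNMAPPED):
        return line
    fields[9] = "*"   # SEQ column
    fields[10] = "*"  # QUAL column
    newline = "\n" if line.endswith("\n") else ""
    return "\t".join(fields) + newline
```

Optional tags after the eleventh column are preserved, so downstream tools that only need positions and alignment scores would still see them.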

@sfiligoi
Contributor Author

@ch4rr0 Could you please review?

@ch4rr0
Collaborator

ch4rr0 commented Jan 11, 2024

Hello Igor, I will take a look today.

Collaborator

@ch4rr0 ch4rr0 left a comment

I am ok with the changes. My only question is why do this in bowtie2 as opposed to an AWK script, for example?

@sfiligoi
Contributor Author

The resulting output file is huge, putting a lot of strain on the IO system.
Reducing the IO cost at the source would be highly preferred.

@sfiligoi
Contributor Author

CC @wasade

@wasade
Contributor

wasade commented Jan 11, 2024

Thanks, @sfiligoi!

@ch4rr0, shaving IO natively within bowtie2 would be pleasant

@ch4rr0
Collaborator

ch4rr0 commented Jan 11, 2024

@BenLangmead, thoughts?

@sfiligoi
Contributor Author

Just a reminder....

@BenLangmead
Owner

I think this kind of straightforward postprocessing is best left to awk and similar tools. Otherwise we accumulate too many command-line options that make later changes trickier.

I know that this is in tension with the fact that Bowtie had the --suppress option for this purpose: https://bowtie-bio.sourceforge.net/manual.shtml#bowtie-options-suppress. But I think keeping it simple is key.

@sfiligoi
Contributor Author

Unfortunately, --suppress does not work with -S/--sam.

@BenLangmead
Owner

Correct

@wasade
Contributor

wasade commented Jan 18, 2024

Hi @BenLangmead, this option is valuable to our efforts with Qiita (https://qiita.ucsd.edu/). Qiita currently houses .sam output from 50-100k metagenomic samples, which are typically mapped against a few databases. The overall volume of data is large, and reprocessing occurs periodically. We currently post-process to reduce the storage burden, but avoiding the significant IO needed to stage .sam temporarily for filtering would be an appreciable runtime improvement.

@BenLangmead
Owner

I appreciate your comments; I suggest awk or mawk or similar should be a good expedient, or feel free to use a fork with your change. We do not plan to integrate this feature into the master branch.
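
As a rough illustration of the pipe-based post-processing suggested here, written in Python rather than awk, the sketch below filters an aligner's standard output on the fly so that the full SAM never has to be staged on disk. The function name `stream_filtered` is hypothetical, and the choice to leave unmapped, secondary, and supplementary records untouched is an assumption of this sketch.

```python
import subprocess
import sys
from typing import IO, Sequence

# SAM FLAG bits for unmapped (0x4), secondary (0x100), supplementary (0x800).
NOT_PRIMARY = 0x4 | 0x100 | 0x800


def stream_filtered(cmd: Sequence[str], out: IO[str]) -> int:
    """Run cmd, rewrite its SAM output line by line, and return its exit code.

    Hypothetical helper: SEQ and QUAL are replaced by '*' on primary
    alignments; all other lines are copied through unchanged.
    """
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True)
    assert proc.stdout is not None
    for line in proc.stdout:  # lines are consumed as the child produces them
        if not line.startswith("@"):
            fields = line.rstrip("\n").split("\t")
            if len(fields) >= 11 and not (int(fields[1]) & NOT_PRIMARY):
                fields[9] = fields[10] = "*"
                line = "\t".join(fields) + "\n"
        out.write(line)
    return proc.wait()
```

A call such as `stream_filtered(["bowtie2", "-x", "my_index", "-U", "reads.fq"], sys.stdout)` would then write the reduced SAM directly, since bowtie2 emits SAM on standard output when `-S` is not given; `my_index` and `reads.fq` are placeholder names. This still pays the cost of pushing the full records through a pipe, which is the overhead the proposed option aimed to remove.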

@sfiligoi sfiligoi closed this Jan 18, 2024
@wasade
Contributor

wasade commented Jan 18, 2024

Thanks, @BenLangmead! We appreciate the follow-up, and all of the incredible work that has gone, and continues to go, into bowtie2!

4 participants